An XML Retrieval Document Similarity Algorithm Based on Bayesian Classifier

doi:10.3969/j.issn.1006-2475.2012.01.009

Computer and Modernization, 2012, Vol. 1, Issue (1): 34-36, 8.

HAN Xiao-mei, ZHENG Hong-yuan, DING Qiu-lin

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Received: 2011-09-09  Online: 2012-01-10  Published: 2012-01-10

Abstract: Query similarity is currently computed mainly by comparing retrieval results with the query. This paper proposes an algorithm based on a Bayesian classifier to compute the similarity of XML query results. After the similarity between each retrieved document and the query has been computed, a Bayesian classifier divides the retrieved XML documents into a relevant set and an irrelevant set; the final similarity value is then determined by computing the similarity between the relevant and the irrelevant documents. Experimental analysis shows that, without affecting recall, the similarity obtained in this way is about 15% more precise than that of the traditional method, which effectively improves retrieval performance.
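The three steps named in the abstract (per-document similarity to the query, Bayesian classification into relevant and irrelevant sets, a final similarity derived from those sets) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are treated as plain bags of words (XML structure is ignored), per-document similarity is cosine similarity, the Bayesian classifier is a two-class multinomial naive Bayes trained on a small hypothetical labeled set, and, because the abstract does not say how the two sets determine the final value, the rescoring rule in `rerank` (base similarity plus `weight` times the difference between a document's mean similarity to the relevant set and to the irrelevant set) is a hypothetical choice. All names (`tokenize`, `cosine`, `NaiveBayes`, `rerank`, `weight`) and the sample data are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (assumption: text already extracted from XML elements)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)  # per-class document counts -> priors
        self.term_counts = {c: Counter() for c in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab_size = len(set().union(*self.term_counts.values()))
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.term_counts.items():
            total = sum(counts.values())
            # log P(class) + sum over terms of n * log P(term | class)
            score = math.log(self.doc_counts[label] / n_docs)
            for term, n in doc.items():
                score += n * math.log((counts[term] + 1) / (total + self.vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def rerank(query, results, classifier, weight=0.5):
    q = Counter(tokenize(query))
    vecs = [Counter(tokenize(r)) for r in results]
    # Step 1: similarity of each retrieved document to the query.
    base = [cosine(q, v) for v in vecs]
    # Step 2: split the results into relevant and irrelevant sets.
    labels = [classifier.predict(v) for v in vecs]
    relevant = [i for i, y in enumerate(labels) if y == "relevant"]
    irrelevant = [i for i, y in enumerate(labels) if y == "irrelevant"]

    def mean_sim(i, group):
        # Mean similarity of document i to the other members of a set.
        others = [j for j in group if j != i]
        if not others:
            return 0.0
        return sum(cosine(vecs[i], vecs[j]) for j in others) / len(others)

    # Step 3 (hypothetical rule): reward closeness to the relevant set and
    # penalize closeness to the irrelevant set.
    scores = [base[i] + weight * (mean_sim(i, relevant) - mean_sim(i, irrelevant))
              for i in range(len(vecs))]
    return labels, scores


# Usage with a tiny hypothetical training set and result list.
train = [("xml retrieval query similarity", "relevant"),
         ("xml element retrieval ranking", "relevant"),
         ("football match score", "irrelevant"),
         ("weather forecast rain", "irrelevant")]
nb = NaiveBayes().fit([Counter(tokenize(t)) for t, _ in train], [y for _, y in train])
results = ["xml query retrieval", "rain forecast today", "xml similarity ranking"]
labels, scores = rerank("xml retrieval", results, nb)
for r, y, s in zip(results, labels, scores):
    print(f"{s:.3f}  {y:<10}  {r}")
```

Swapping in a different similarity measure or classifier only changes `cosine` or `NaiveBayes`; the three-step structure stays the same.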

Key words: Bayesian classifier, query similarity, XML retrieval document, information retrieval

Chinese Library Classification (CLC) number: TP391

HAN Xiao-mei, ZHENG Hong-yuan, DING Qiu-lin. An XML Retrieval Document Similarity Algorithm Based on Bayesian Classifier[J]. Computer and Modernization, 2012, 1(1): 34-36, 8.

